Abstract

This report describes progress toward constructing a unified, computer-based model
of all the major phenomena in cognitive skill acquisition. An extensive review of the
literature was completed and published, along with several new analyses of important
but hitherto neglected aspects of cognitive skill. These begin to articulate a new and
exceedingly simple theory of cognitive skill. However, the computational embodiment
of the theory was only partially implemented during the 15 months of funding, and
more work is needed.


1  Objectives 

It is useful and traditional to view cognitive skill acquisition as having three phases. The
early phase consists of studying expository material, such as a text or a lecture, that teaches
the basic principles and procedures of the skill. The intermediate phase initiates learning
how to put these basic ideas into practice by solving problems with them. The main
difference between the early and intermediate phases is that during the intermediate phase
the student focuses mainly on solving a problem or explaining an example, with occasional
interruptions to refer to the text, ask a question or reflect on the task, whereas during
the early phase, the student focuses mainly on understanding the text or teacher and is
not actively involved in trying to solve a problem or study an example. The late phase
begins when the student can turn out error-free solutions on a regular basis, thus indicating
that they have mastered the conceptual material of the task domain. They may still make
unintentional errors (slips). The frequency of slips and the time to do a task decrease slowly
with practice during the late phase.

The objective of this grant was to extend Cascade to cover late-phase effects. Cascade
was developed under an earlier ONR grant as a model of the intermediate phase of cognitive
skill acquisition, and of the self-explanation effect in particular (VanLehn et al., 1992; VanLehn
and Jones, 1993a). The basic idea was to add a simple model of memory and demonstrate
that it could handle all the major phenomena of the intermediate and late phases. These
phenomena would be demonstrated by modeling the acquisition of physics expertise.

The project was divided into four subtasks. Each subtask will be discussed in turn.


2  Developing  an  architecture 


The original Cascade system was built in Prolog with no explicit model of memory. It had
grown quite baroque as it evolved, and we doubted that we could add a model of memory
in any simple way. Thus, we searched for an architecture (i.e., a programming language
with an integrated model of memory) that would make modeling physics easy.

We considered ACT-R, Soar, OPS and several other off-the-shelf architectures. None
were adequate, basically because they used an attribute-value representation and matching,
whereas physics is most simply represented with a clausal representation and unification.
Thus, we developed a production system that used Prolog-like clauses as its working memory
elements and unification instead of matching in the production system interpreter. With
John Anderson’s blessing, we copied the production strengthening code from ACT-R.
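
To make the contrast between attribute-value matching and clausal unification concrete,
here is a minimal sketch in Python. It is not the project's Prolog-based code: the term
encoding, the function names `unify` and `match_conditions`, and the example facts are all
assumptions made for illustration. The occurs check and the renaming of variables between
rules and memory elements are omitted for brevity.

```python
from typing import Dict, Iterator, Optional, Sequence, Union

# Assumed encoding (not taken from the report): variables are strings that
# start with an uppercase letter, as in Prolog; constants are other strings
# or numbers; clauses are tuples of the form (functor, arg1, arg2, ...).
Term = Union[str, int, float, tuple]
Bindings = Dict[str, Term]


def is_var(term: Term) -> bool:
    return isinstance(term, str) and term[:1].isupper()


def walk(term: Term, bindings: Bindings) -> Term:
    """Follow bindings until reaching an unbound variable or a non-variable."""
    while is_var(term) and term in bindings:
        term = bindings[term]
    return term


def unify(x: Term, y: Term, bindings: Bindings) -> Optional[Bindings]:
    """Return bindings extended so that x and y become equal, or None."""
    x, y = walk(x, bindings), walk(y, bindings)
    if x == y:
        return bindings
    if is_var(x):
        return {**bindings, x: y}
    if is_var(y):
        return {**bindings, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            result = unify(xi, yi, bindings)
            if result is None:
                return None
            bindings = result
        return bindings
    return None


def match_conditions(
    conditions: Sequence[Term],
    memory: Sequence[Term],
    bindings: Optional[Bindings] = None,
) -> Iterator[Bindings]:
    """Yield every set of bindings under which all conditions unify with
    some working memory element (simple depth-first search)."""
    bindings = {} if bindings is None else bindings
    if not conditions:
        yield bindings
        return
    first, rest = conditions[0], conditions[1:]
    for element in memory:
        extended = unify(first, element, bindings)
        if extended is not None:
            yield from match_conditions(rest, memory, extended)


# Hypothetical working memory and rule conditions for a mechanics problem.
memory = [("force", "tension", "string1", "block1"), ("mass", "block1", 2.0)]
conditions = [("force", "Kind", "Agent", "Body"), ("mass", "Body", "M")]
solutions = list(match_conditions(conditions, memory))
```

Because `unify` binds variables on either side, a working memory element may itself contain
variables, which is the property that one-way pattern matching lacks.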

In the course of this development, we spent considerable time evaluating the evidence
for the procedural-declarative distinction. Leaving neurological evidence aside, there are
basically two main pieces of evidence for the distinction. One is that the learning curves
for individual production rules fit a power law beautifully, except for the first trial, which
takes considerably longer than it should (Anderson et al., 1989; Bovair et al., 1990). The
extra latency has been taken as evidence of a process that converts declarative to procedural
knowledge. However, more careful experimental work indicates that this extra latency isn’t
really there (Anderson and Fincham, 1994).

The second type of evidence is the use-specificity of transfer. Practice up to a certain
point (several dozen trials) causes a reduction in time-to-mastery of the transfer task, but
practice after that point causes no further time savings. The initial practice is taken as
strengthening declarative knowledge, which is shared between the tasks, and thus the initial
practice benefits the transfer task. The later practice is taken as strengthening only the
procedural knowledge, which is not shared between the tasks, and thus the later practice
does not benefit the transfer task (Singley and Anderson, 1989). Examining protocols of
learning during the first dozen or so uses of a rule (VanLehn, 1995d) convinced us that
there is a conversion of knowledge going on, but it is a conversion of some vague, poorly
understood general instructions into a complete, fully debugged, operational procedure for
solving problems. This is an entirely conscious process. One can hear the learning events in
protocols as subjects debug their skill. The initial debugging transfers to new tasks because
students are primarily concerned with understanding parts of the instructional material that
they didn’t understand the first time they read it. The later practice does not transfer
because it mostly results in speed-up of the task-specific procedure that the subjects have
constructed, and removal of a few bugs that are specific to that procedure. Thus, we think
that there is a shift from a superficial understanding of the instruction to an operational
understanding, but both forms of knowledge are consciously accessible and the conversion is
a deliberate process. In contrast, the ACT-R position is that the operational form of knowledge
is not consciously accessible and the conversion is automatic.

We also spent considerable time pondering a second distinction, which is the difference
between rule-based and schema-based (or case-based) architectures. The difference is
primarily one of the size of the unit of knowledge retrieved. Experts do seem to retrieve large
pieces of knowledge all at once, but novices retrieve knowledge in small pieces. The
schema-based architectures can model experts, but they have a difficult time modeling novices.
The rule-based architectures are good for modeling novices. Given some mechanism of
association (e.g., spreading activation), they should be able to handle schema effects as well.
Although this has not been demonstrated computationally, it seemed worth trying.

In short, re-examining the literature on cognitive architectures convinced us that a good
old-fashioned production system would work best as an architecture for the new Cascade.
However, the architecture would be non-standard in two respects. First, it had a
kind of strengthening and context effect parameters associated with each rule. These would
get updated with each application of the rule, and would affect its probability [...].
Second, working memory elements and rules were treated exactly the same way. They had
the same kind of memory parameters. Learning a new rule is done by creating a working
memory element that has the syntax of a rule.
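
The uniform treatment of rules and working memory elements can be sketched as follows.
This is only an illustration under stated assumptions: the report does not give the
strengthening formula (the code copied from ACT-R is not reproduced here), and the text
describing what the parameters affect is incomplete. The `MemoryItem` class, the
unit increment per use, and the proportional choice rule in `choice_probabilities` are all
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List


@dataclass
class MemoryItem:
    """One item in memory. Facts and rules share this structure, so both
    carry the same strength and context parameters."""
    content: tuple
    strength: float = 1.0
    context_counts: Dict[Hashable, int] = field(default_factory=dict)

    def is_rule(self) -> bool:
        # Assumed convention: a rule is an item of the form
        # ("rule", conditions, actions); anything else is a plain fact.
        return len(self.content) == 3 and self.content[0] == "rule"

    def record_use(self, context: Hashable) -> None:
        # Assumption: each use adds one unit of strength and one count
        # for the context in which the item was used.
        self.strength += 1.0
        self.context_counts[context] = self.context_counts.get(context, 0) + 1


def choice_probabilities(items: List[MemoryItem], context: Hashable) -> List[float]:
    """Assumed choice rule: probability proportional to strength plus the
    number of earlier uses in the current context."""
    weights = [item.strength + item.context_counts.get(context, 0) for item in items]
    total = sum(weights)
    return [weight / total for weight in weights]


# Learning a new rule amounts to storing an item that has the syntax of a rule.
fact = MemoryItem(("mass", "block1", 2.0))
rule = MemoryItem(("rule", (("string", "S"),), (("force", "tension", "S"),)))
for _ in range(3):
    rule.record_use("pulley-problem")
probabilities = choice_probabilities([fact, rule], "pulley-problem")
```

Nothing in this sketch distinguishes procedural from declarative items beyond the shape of
`content`, which mirrors the second non-standard property described above.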

Although the architecture sounds simple, its implementation turned out to be overly
complex. The review of the architecture literature went on simultaneously with the
development of the implementation. As a consequence, the implementation embeds some
assumptions that were once thought to be correct, but are no longer believed.
However, it works, so we have not dared (yet) to throw it out and build a simpler one that
conforms to present beliefs.

3  Review  of  the  cognitive  skill  acquisition  literature 

Given the goal of accounting for all the major findings in cognitive skill acquisition, it was
important to update the list of findings from the one presented in the grant proposal, which
was based on VanLehn (1990). The resulting review, which is in press (VanLehn, 1995b),
concluded that there are four basic sets of findings:

• Practice/Transfer. This group includes research on the power law of practice,
automatization and the identical elements theory of transfer. The most recent relevant work
has tried to understand the use-specificity of practice: why does increasing the
practice on a training task decrease the amount of transfer to a transfer task (Anderson
and Fincham, 1994)? Other, less relevant work has tried to settle basic questions of
memory architecture using practice effects. For instance, does retrieval get easier
because memory items get stronger, or because there are more duplicate copies of them
(Logan, 1988)?

• Expert/Novice. No major new phenomena have been discovered since the 1990
review, but Ericsson has proposed a good unifying framework based on the idea that
practice causes experts to adapt their knowledge to the task domain (Ericsson and
Lehmann, 1996). In particular, for task domains such as physics, where one has all the
information needed to solve a problem at the start, experts develop the ability to plan
solutions mentally. This accounts for their ability to classify problems on the basis
of the problems’ solutions rather than their surface features (Chi et al., 1981), their
changes in strategy (or lack thereof) (Larkin, 1983; Priest and Lindsay, 1992), and
their improved memory of intermediate states (Chase and Simon, 1973).

• Good/Poor Learners. This group of studies includes investigations of the self-explanation
effect, reflection and tutoring techniques such as feedback and mastery-based
advancement. This is mostly new work. The notion of a learning event seems crucial for
explaining most of the phenomena (VanLehn, 1995b).

• Schema acquisition. This group of studies, most of which are fairly recent,
investigates the way students use examples to help them solve problems, and thereby develop
general solution techniques (schemas). This literature is a confusing mess until one
notices that learning is being measured in two ways. In the majority of the studies,
the examples and problems are isomorphic, so copying a solution from the example
to the problem will get the problem right. Success thus depends mostly on retrieval
and generalization. The other studies use problems that are not isomorphic to the
examples. They generally find very little learning, and only under special
circumstances which tend to promote self-explanation and other effective studying strategies
(VanLehn, 1995b).


As it turns out, the key to understanding the more recent literature was conducting a
fine-grained protocol analysis of the process of referring to examples while studying problems
(VanLehn, 1995a). This process, which is often called analogical problem solving (somewhat
misleadingly, I might add), is involved in the schema acquisition studies, but those studies
used only outcome measures instead of protocol analysis or other process measures.

The protocol analyses were initially conducted just because subjects often use analogical
problem solving and yet Cascade did not model it, or at least, not very accurately
(VanLehn and Jones, 1993b). However, it was discovered that different students used different
strategies (or policies) for when to refer to the example and how much information to copy
from it. The Good learners tend to use analogy less frequently and to copy less material
than the Poor learners, who tend to copy the whole solution instead of trying to solve the
problem themselves. This observation and other data suggest that maximizing one’s use of
analogy hurts learning and minimizing it helps learning (VanLehn, 1995c).

At any rate, this review of the literature ended up breaking new ground in cognitive
skill acquisition, and as a consequence took considerably longer than anticipated.


4  Modeling  the  major  phenomena 

We picked the expert-novice phenomena as our first group of phenomena for modeling, since
we had access to the raw data for some of the seminal studies in this area (Chi et al., 1981;
Larkin, 1983). Analysis of the expert protocols led us to the same hypothesis as Ericsson
(albeit independently): experts plan abstract solutions in their heads and novices do not.
Koedinger and Anderson (1990) also reached the same conclusion for geometry expertise.

For instance, if a physics expert is given a problem that requires more than a dozen
equations to answer algebraically, the expert will construct an abstract plan consisting
of only one or two major equations (selected from a small set, including Newton’s law,
conservation of energy, conservation of momentum and kinematics). Constructing this plan
seems to involve search, but not much search, as such plans are rather short. Once the plan
is constructed, the expert implements it by writing the major equations and all the minor
equations required to support the major ones.

We investigated several techniques for representing abstraction, starting with Koedinger’s
ideas about “diagram configuration schemas.” Unfortunately, the abstractions that physics
experts plan with are equations, and these can be applied in many ways, depending on what
quantities are sought and given, what objects can be grouped together into a single object,
and what time intervals are relevant. None of the standard representations for planning
operators had the flexibility that our task domain demanded.

Although one has the impression that an expert is simply applying a single planning
operator each time they say something like “I’ll use Newton’s second law to get the
acceleration,” there is too much flexibility in the application of major principles to represent all the
knowledge in a single structure. We found it necessary to use multiple rules that elaborate
the problem description with possible objects, time intervals, forces, energy and momenta.
The major principles are then applied to this elaborated description. We couldn’t find a way
to build the elaborative inferencing into the principles themselves, as is done in Koedinger’s
model and other planning formalisms.

Unexpectedly, this yields a much simpler account of the expert-novice shift. We had
thought that experts could plan because they possessed abstract planning operators that
novices lacked, and thus a major impact of learning was building such abstract operators.
Because we ended up not using such operators, we have an even simpler account of learning.
We assume that both experts and novices have the algebraic skill to apply algebraic
principles abstractly, without writing them down or worrying about the mathematical details.
We also assume that both experts and novices know the rules required for elaborating the
problem descriptions (e.g., that a string exerts a tension force; that two abutting blocks can
be considered a single object, etc.). However, the experts’ elaborative rules are so strong
that they can apply them almost effortlessly, and conclusions produced by the rules are
easily retained in memory. The novices, on the other hand, cannot do such elaborations in
memory. They need to write elaborations down (e.g., as forces on a free-body diagram, or
candidate equations). Moreover, they are likely to overlook possible elaborations. To put
it a little differently, the experts see much more in a problem than the novices. They see
compound objects, forces, energies and other entities that are not mentioned in the problem
statement. Novices have to work to “see” these entities, and even then, they may not see
them.

The implementation of this account progressed as far as building a problem solver that
simulated an expert. Moreover, it could solve problems from all 6 textbook chapters on
mechanics, whereas the original version of Cascade was limited to 1 chapter. This work
was reported in Jonathan Rubin’s Master’s Thesis (Rubin, 1994). Unfortunately, it was built
on top of an older version of the Cascade architecture, and would have to be modified in
order to run on the latest version. We do not know if merely turning on the memory model
and giving Cascade hundreds of hours of practice would cause an expert-novice shift, but
it seems likely that it would.


5  Porting  the  learning  events  code 

The last task of the project was to import the learning algorithms used in the original
version of Cascade. Cascade had three such algorithms: explanation-based learning of
correctness, analogical search control and analogical abduction (VanLehn and Jones, 1993a).
Subsequent empirical work found no evidence for analogical search control (VanLehn and
Jones, 1993b; VanLehn, 1995a). We ported the code for explanation-based learning, but
have not yet ported the code for analogical abduction.


6  Final  status

With only 15 months of funding, we were able to complete many of our goals but not all
of them. We did derive a theory of cognitive skill acquisition that covers major phenomena
in both the intermediate and late phases of skill acquisition. This theory is best expressed
in a review article (VanLehn, 1995b), although because it is a review article, the theory is
hidden between the lines. The basic assumptions of the theory are:

• All memory items have the same status; there is no procedural/declarative distinction.

• The only automatic, architectural mechanisms are the traditional, basic ones from
memory research: strengthening and retrieval cues that include context.

• All learning phenomena that are not due to strengthening are due to the learners’
deliberate reformulation of their knowledge.

• Such reformulations are manifested as learning events, wherein learners interrupt their
problem solving and reflect on the task domain knowledge itself.

This theory is considerably simpler than ACT*, Soar and other theories. Because its model
of memory is simpler, it is unable to model phenomena that only show up in brief tasks
(e.g., semantic priming). However, its simplicity may allow it to model more phenomena
at the molar scale that is most interesting to educators and others who are interested in
understanding learning in order to improve training.

Constructing this theory involved a considerable amount of protocol analysis. This
was not anticipated in the original proposal. However, it led to three articles on the
nature of learning events and analogical problem solving (VanLehn, 1995a; VanLehn, 199[?];
VanLehn, 1995c).

The four articles just cited, along with a fifth one comparing our model of the
self-explanation effect to 2 other models (VanLehn and Jones, 1995) and Rubin’s Master’s
Thesis (Rubin, 1994), constitute the main publications produced during this grant.

We developed parts of a computer model that embodies the theory. The parts include a
cognitive architecture, a model of physics expertise and a version of the original Cascade
EBLC learning mechanism. Although it would take considerable work to complete the
model, the graduate students and programmer who implemented the model have all left, so
development on the model has halted temporarily. Nonetheless, the implementation taught
us many things about cognitive skill acquisition, and these are reflected in the 6 publications
mentioned above.